This document contains the results of the statistical analysis for … project. According to the study protocol, the following analyses should be performed:
The relationship between the outcome and independent variables such as age, height, weight, ethnicity and BMI will be examined with regression models.
For each relationship between the outcome and an independent variable, the linear and transformed (e.g., quadratic, power, log) forms will be considered.
The following variables have been collected during the visit and entered into the REDCap system:
Data sets used:
The data set contains data for n = 400 participants marked as “randomized”.
Table 1 contains the demographic characteristics of all randomized participants.
Table 1: all randomized participants

| Variable | Female | Male | Overall |
|---|---|---|---|
| Age | | | |
| Count | 153 | 247 | 400 |
| Mean (SD) | 10.91 (3.30) | 10.34 (3.55) | 10.56 (3.46) |
| Median (IQR) | 11.19 (4.34) | 10.48 (5.11) | 10.69 (4.72) |
| Q1, Q3 | 8.64, 12.98 | 7.78, 12.89 | 8.23, 12.95 |
| Min, Max | 0.87, 19.91 | 0.20, 19.34 | 0.20, 19.91 |
| Missing | 0 | 0 | 0 |
| Height | | | |
| Count | 153 | 247 | 400 |
| Mean (SD) | 143.78 (19.97) | 140.97 (21.34) | 142.05 (20.84) |
| Median (IQR) | 143.95 (27.10) | 142.95 (29.58) | 143.00 (28.78) |
| Q1, Q3 | 130.48, 157.58 | 127.59, 157.17 | 128.52, 157.30 |
| Min, Max | 86.12, 191.91 | 87.27, 194.77 | 86.12, 194.77 |
| Missing | 0 | 0 | 0 |
| Weight | | | |
| Count | 153 | 247 | 400 |
| Mean (SD) | 38.50 (10.86) | 36.04 (12.19) | 36.98 (11.75) |
| Median (IQR) | 39.99 (14.80) | 36.77 (16.14) | 37.58 (15.60) |
| Q1, Q3 | 30.95, 45.76 | 28.38, 44.52 | 29.28, 44.88 |
| Min, Max | 10.65, 67.66 | -5.48, 68.94 | -5.48, 68.94 |
| Missing | 0 | 0 | 0 |
| Outcome | | | |
| Count | 153 | 247 | 400 |
| Mean (SD) | 0.52 (0.22) | 0.55 (0.23) | 0.54 (0.22) |
| Median (IQR) | 0.49 (0.31) | 0.55 (0.31) | 0.54 (0.31) |
| Q1, Q3 | 0.37, 0.68 | 0.40, 0.71 | 0.38, 0.70 |
| Min, Max | -0.05, 1.00 | -0.09, 1.10 | -0.09, 1.10 |
| Missing | 0 | 0 | 0 |
| Gender | | | |
| Count (%) | 153 (38.25%) | 247 (61.75%) | 400 |
| (Col %) | | | |
| Female | 153 (100.00%) | 0 (0.00%) | 153 (38.25%) |
| Male | 0 (0.00%) | 247 (100.00%) | 247 (61.75%) |
| Missing | 0 | 0 | 0 |
| Ethnicity | | | |
| Count (%) | 153 (38.25%) | 247 (61.75%) | 400 |
| (Row %) | | | |
| Caucasian | 110 (38.19%) | 178 (61.81%) | 288 (100.00%) |
| Other | 43 (38.39%) | 69 (61.61%) | 112 (100.00%) |
| Missing | 0 | 0 | 0 |
| BMI | | | |
| Count | 153 | 247 | 400 |
| Mean (SD) | 18.24 (1.98) | 17.45 (2.86) | 17.76 (2.58) |
| Median (IQR) | 18.22 (2.52) | 17.69 (2.79) | 17.89 (2.57) |
| Q1, Q3 | 17.01, 19.53 | 16.44, 19.23 | 16.72, 19.29 |
| Min, Max | 11.52, 23.84 | -7.19, 23.59 | -7.19, 23.84 |
| Missing | 0 | 0 | 0 |
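The summary rows reported for each continuous variable in Table 1 could be reproduced along the following lines. In the table the IQR is the difference Q3 − Q1 (for female age, 12.98 − 8.64 = 4.34). This is a hedged sketch in Python using only the standard library: the function name `describe`, the toy values and the choice of the "inclusive" quantile method are assumptions, and the software that produced the table may use a different quantile definition.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Summarise one continuous variable in the layout of Table 1."""
    # The quantile method is an assumption; other definitions give slightly
    # different Q1 and Q3 on small samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation
        "median": median(values),
        "iqr": q3 - q1,       # reported in the table as Median (IQR)
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical toy values, not study data.
toy_age = [1, 2, 3, 4, 5]
summary = describe(toy_age)
print(f"Mean (SD): {summary['mean']:.2f} ({summary['sd']:.2f})")
print(f"Median (IQR): {summary['median']:.2f} ({summary['iqr']:.2f})")
print(f"Q1, Q3: {summary['q1']:.2f}, {summary['q3']:.2f}")
```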
Dependent variables: All dependent variables are continuous.
Independent variables: continuous or categorical.
• Age: continuous
• Weight: continuous
• Height: continuous
• BMI: continuous
• Sex: dichotomous categorical
• Ethnicity: categorical
The data set will be checked for multicollinearity among the independent variables, using correlation analysis as well as the variance inflation factor (VIF).
Each histogram shows the distribution of an independent variable by sex; the distribution of the whole data set is shown in grey in the background. Visual examination helps to identify possible outliers or extreme values and to judge whether any transformation should be applied.
The Shapiro-Wilk normality test was performed for the independent variables and the outcome to examine whether each variable follows a normal distribution.
| Variable | Statistic (W) | P-value |
|---|---|---|
| Age | 0.9965 | 0.52405 |
| Height stand | 0.9946 | 0.17119 |
| Weight | 0.9942 | 0.13656 |
| Outcome | 0.9949 | 0.21135 |
Consider the association of the outcome with gender and with ethnicity (Wilcoxon test).
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method |
|---|---|---|---|---|---|---|---|
| Outcome | Female | Male | 0.161 | 0.16 | 0.16 | ns | Wilcoxon |
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method |
|---|---|---|---|---|---|---|---|
| Outcome | Caucasian | Other | 0.966 | 0.97 | 0.97 | ns | Wilcoxon |
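For readers who want to see what a two-group Wilcoxon comparison of this kind computes, here is a minimal sketch in Python using only the standard library. It implements the rank-sum (Mann-Whitney U) statistic with a two-sided normal approximation and no continuity or tie correction, so its p-values may differ slightly from those of a statistical package. The function name `rank_sum_test` and the toy values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_test(group1, group2):
    """Return (U statistic for group1, two-sided p-value, normal approximation)."""
    # Pool both groups, remembering which group each value came from.
    pooled = sorted(
        (value, label)
        for label, group in enumerate((group1, group2))
        for value in group
    )

    # Sum the ranks of group1, giving tied values their average rank.
    rank_sum_g1 = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_g1 += average_rank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0
        )
        i = j

    n1, n2 = len(group1), len(group2)
    u1 = rank_sum_g1 - n1 * (n1 + 1) / 2
    expected = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (u1 - expected) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p_value


# Hypothetical toy outcome values for two groups.
toy_female = [0.41, 0.52, 0.47, 0.60, 0.38]
toy_male = [0.55, 0.49, 0.63, 0.58, 0.44]
u, p = rank_sum_test(toy_female, toy_male)
print(f"U = {u}, p = {p:.3f}")
```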
Examining the association of outcome variables with independent variables stratified by sex.
This section examines the bivariate and multivariable associations between the IOS variables and the independent variables (IV). The main aim is to identify significant bivariate associations and, taking into account the high correlation among the IV, to select the best candidates for the final equation while avoiding multicollinearity.
Model: lm(Outcome ~ Age + Gender + Caucasian + Height_stand + Weight + bmi)
| Dependent: Outcome | | Coefficient (univariable) | Coefficient (multivariable) |
|---|---|---|---|
| Age | [0.2,19.9] | -0.053 (-0.057 to -0.050, p<0.001) | -0.041 (-0.053 to -0.030, p<0.001) |
| Gender | Female | (reference) | (reference) |
| | Male | 0.031 (-0.014 to 0.077, p=0.173) | 0.005 (-0.021 to 0.031, p=0.696) |
| Ethnicity | Caucasian | (reference) | (reference) |
| | Other | 0.002 (-0.047 to 0.052, p=0.921) | -0.002 (-0.030 to 0.025, p=0.870) |
| Height | [86.1,194.8] | -0.009 (-0.009 to -0.008, p<0.001) | -0.002 (-0.005 to 0.002, p=0.289) |
| Weight | [-5.5,68.9] | -0.014 (-0.016 to -0.013, p<0.001) | -0.001 (-0.007 to 0.005, p=0.759) |
| BMI | [-7.2,23.8] | -0.019 (-0.027 to -0.010, p<0.001) | 0.006 (-0.003 to 0.016, p=0.186) |

Number in dataframe = 400, Number in model = 400, Missing = 0, Log-likelihood = 266.3, AIC = -516.6, R-squared = 0.69, Adjusted R-squared = 0.69
| Variable | VIF |
|---|---|
| Age | 10.27 |
| Gender | 1.02 |
| Ethnicity | 1.01 |
| Height | 33.12 |
| Weight | 34.23 |
| BMI | 3.92 |
Let’s start with the Outcome1 model.
Because visual examination of the association of Outcome with the independent variables suggests a linear association between Outcome and Age and Height, the following statistics will be reported:
Verification of the linear association between the IOS variables and Independent variables.
The residual errors (in red) between the observed values and the fitted regression line are shown. Each vertical red segment represents the residual error between an observed Outcome value and the corresponding predicted (i.e. fitted) value.
The red line is approximately horizontal at zero, indicating little pattern in the residuals…
The QQ plot of residuals can be used to visually check the normality assumption. The normal probability plot of residuals should approximately follow a straight line. In our example, minimal deviation from the reference line is observed at the ….; other plots could suggest that the assumption of normality of residuals is violated.
To check the homogeneity of variance of the residuals (homoscedasticity), we verify whether the points are equally spread around the horizontal line; this is observed for Age (no transformation) and Height_stand (no transformation).
Based on the previous results, the following models will be compared for the IOS variables of interest.
| Dependent: Outcome | | Coefficient (univariable) | Coefficient (multivariable) |
|---|---|---|---|
| Model 3 | | | |
| Age | [0.2,19.9] | -0.053 (-0.057 to -0.050, p<0.001) | -0.040 (-0.051 to -0.029, p<0.001) |
| Height | [86.1,194.8] | -0.009 (-0.009 to -0.008, p<0.001) | -0.002 (-0.004 to -0.000, p=0.014) |
| Model 4 | | | |
| Age | [0.2,19.9] | -0.053 (-0.057 to -0.050, p<0.001) | -0.055 (-0.058 to -0.051, p<0.001) |
| BMI | [-7.2,23.8] | -0.019 (-0.027 to -0.010, p<0.001) | 0.005 (0.000 to 0.010, p=0.044) |
| Model 5 | | | |
| Height | [86.1,194.8] | -0.009 (-0.009 to -0.008, p<0.001) | -0.009 (-0.009 to -0.008, p<0.001) |
| Model 6 | | | |
| Age | [0.2,19.9] | -0.053 (-0.057 to -0.050, p<0.001) | -0.053 (-0.057 to -0.050, p<0.001) |